Semiotics and Modeling Computer Classification of Text with Genetic Algorithm: Analysis and First Results

Authors

  • Vincent Rialle
  • Jean-Guy Meunier
  • Sofiane Oussedik
  • Georges Nault

Abstract

Computer engineering proposes building complex systems by dynamic prototyping (Buddle and Bacon, 1992). But this prototyping cannot be purely inductive, a mere trial-and-error process. To succeed, one must possess an underlying hypothetical model (Marr, 1982) of what the functions of the system are. If these functions relate to physical tasks, such as sensing temperature, manipulating objects, etc., the desired behavior can be observed and a model can be built. Conversely, if the functions of the system apply to semio-informational tasks, such as language translation, information retrieval, hypertext navigation, text generation, etc., the interpretative behavior is not readily observable. Like any other computer systems, these systems are symbol manipulation machines (Newell, 1980). They too manipulate inputs and outputs, but these data are in themselves semiotic objects, not physical ones: they have to be interpreted by some cognitive agent. In other words, systems that manipulate physical objects require a model of the physical world, while systems that manipulate informational objects require a semiotic model. In this paper, we illustrate how a semiotic model can help in the conception, modeling, and experimentation of a semiotic behavior such as Computer Assisted Reading and Analysis of Text (CARAT), and how this model has called upon Genetic Algorithm (GA) theory to realize some of its aspects.

I. Presentation of CARAT

I.1 General presentation

Computer Assisted Reading and Analysis of Text is the computer technology that assists readers in grasping some aspects of the informational or semiotic content of a text (discursive, lexical, hypertextual, thematic, stylistic, etc.). CARAT therefore clearly relates to interpretative actions: it is in no way a robot that reads or understands a text by itself.
One of the classical models of text interpretation is the philological one.¹ Over the centuries, thousands of readers, exegetes, and interpreters have practiced this method. Owing to the quality of its principles, it has earned compelling recognition and carries the weight of long experience. The basic principle of the philological perspective is that one can construct relatively systematic procedures capable of ensuring rigor in text interpretation. Philology is, in fact, an instantiation of an interpretative semiotic process applied to the processing of textual signs. It takes a set of signs (a text) as its input, classifies and categorizes them, explores and selects among them, and produces a new set of signs (the commentaries) as its output. This interpretation process can be translated functionally in terms of (a) inscription, (b) classification, (c) exploration, and (d) configuration of information (Seffah and Meunier, 1994). Three dimensions of its principles stand out: text reading and analysis is a systematic, dynamic, and plastic behavior. Systematicity pertains to the controlled processing of information; dynamicity concerns the interaction of the analyst with the text; and plasticity allows the text to be constantly reinterpreted. To respect this particular type of interpretation process, a computer model must rely on an open architecture that allows a systematic, dynamic, and plastic flow of information processing. Each processing step is built out of interactive advances and restarts, which are sometimes autonomous and sometimes interrelated, but which all aim at helping the reader and analyst penetrate the content of the information. Hence, once again, a CARAT system is not a robot reader but a faithful assistant in reading and analyzing texts.
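The four functions of the interpretation process (inscription, classification, exploration, configuration) can be pictured as a chain in which each step returns a new textual object. The Python sketch below is purely illustrative and is not the authors' implementation: the names `TextObject`, `inscribe`, `classify`, `explore`, and `configure`, the two-word segmentation unit, and the word-sharing "classifier" are all hypothetical stand-ins. The paper's own classification relies on genetic algorithms, which this toy does not attempt to reproduce.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TextObject:
    """Hypothetical textual object produced at one moment of an interpretative path."""
    segments: list[str]
    classes: dict[str, list[int]] = field(default_factory=dict)
    selection: list[int] = field(default_factory=list)
    commentary: str = ""


def inscribe(raw: str, size: int = 2) -> TextObject:
    """(a) Inscription: cut the raw text into fixed-size word segments (assumed unit)."""
    words = raw.lower().split()
    segments = [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
    return TextObject(segments)


def classify(obj: TextObject) -> TextObject:
    """(b) Classification: group segment indices by the words they contain (toy stand-in)."""
    classes: dict[str, list[int]] = defaultdict(list)
    for idx, seg in enumerate(obj.segments):
        for word in set(seg.split()):  # set() so each index is recorded once per word
            classes[word].append(idx)
    return TextObject(obj.segments, dict(classes))


def explore(obj: TextObject, term: str) -> TextObject:
    """(c) Exploration: the reader selects the segments belonging to one class."""
    return TextObject(obj.segments, obj.classes, list(obj.classes.get(term, [])))


def configure(obj: TextObject) -> TextObject:
    """(d) Configuration: assemble the selected segments into a new set of signs."""
    lines = [f"[{i}] {obj.segments[i]}" for i in obj.selection]
    return TextObject(obj.segments, obj.classes, obj.selection, "\n".join(lines))


# Usage: one interpretative path over a tiny sample text.
text = "the sign stands for the object the object determines the sign"
obj = configure(explore(classify(inscribe(text)), "sign"))
print(obj.commentary)
```

Because every stage returns a fresh object instead of mutating the previous one, the reader can return to the classified object and explore another term without redoing the inscription, which is one simple way to mirror the "advances and restarts" described above.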
In this perspective, CARAT is defined as the set of serial or parallel operations which, with the assistance of the computer, construct interpretative paths in which each moment produces a new textual object to be classified, explored, and configured.

I.2 CARAT and classification

Similar resources

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...


Publication date: 2007